Distributed IR for Digital Libraries
Abstract
This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols, and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval, using a logistic regression algorithm to estimate the relevance of distributed collections and fusion techniques to combine multiple sources of evidence. We discuss the harvesting method and how it can be employed to build collection representatives using features of the Z39.50 protocol. The extracted collection representatives are ranked using a fusion of probabilistic retrieval methods. The effectiveness of our algorithm is compared to other distributed search methods using test collections developed for distributed search evaluation. We also describe how this system is currently being applied to operational systems in the U.K.
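The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the coefficients, and the choice of a CombSUM-style sum over min-max normalized scores as the fusion step are all assumptions made for illustration, since the abstract does not give the fitted model or name the fusion operator.

```python
import math

# Hypothetical coefficients and feature names; the abstract does not
# report the fitted logistic regression model.
INTERCEPT = -3.0
WEIGHTS = {"match_ratio": 2.0, "log_tf": 0.8, "log_size": -0.1}


def logistic_relevance(features):
    """Estimate P(relevant | query, collection) from a logistic model."""
    log_odds = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-log_odds))


def min_max(scores):
    """Rescale scores to [0, 1] so different evidence sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def fuse(*score_maps):
    """CombSUM-style fusion (an assumed choice): sum normalized scores."""
    fused = {}
    for scores in score_maps:
        for name, value in min_max(scores).items():
            fused[name] = fused.get(name, 0.0) + value
    return fused


def rank_collections(collection_features, other_scores):
    """Rank collections by fusing logistic estimates with a second score."""
    probabilities = {
        name: logistic_relevance(features)
        for name, features in collection_features.items()
    }
    fused = fuse(probabilities, other_scores)
    return sorted(fused, key=fused.get, reverse=True)


# Toy collection representatives for a single query (made-up numbers).
features = {
    "A": {"match_ratio": 1.0, "log_tf": 2.0, "log_size": 5.0},
    "B": {"match_ratio": 0.5, "log_tf": 1.0, "log_size": 5.0},
    "C": {"match_ratio": 0.0, "log_tf": 0.0, "log_size": 5.0},
}
second_source = {"A": 10.0, "B": 12.0, "C": 1.0}
ranking = rank_collections(features, second_source)
```

In this sketch the second evidence source stands in for any other retrieval score computed over the same collection representatives; normalizing before summing keeps one source from dominating purely because of its scale.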
Similar resources
Co-operative Information Retrieval in Digital Libraries
In recent years, the great expansion of distributed resources through, for example, the Internet has set up a framework for the realisation of an age-old vision: gaining first-hand, immediate access to vast amounts of information. The concept of a digital library is a step towards the realisation of this vision, and can be regarded (from a computer science perspective) simply as a distributed infor...
Content-Based Community Formation in Hybrid Peer-to-Peer Networks
We present a community formation method for improving IR performance in a distributed digital library network. Following the digital library literature, we model digital libraries as leaf nodes and regional directory services as ultrapeers in a hybrid P2P architecture. We conceptualize a regional directory service and its local network of digital libraries as a community, and introduce a com...
Report on the TREC-8 Experiment: Searching on the Web and in Distributed Collections
The Internet paradigm permits information searches to be made across wide-area networks where information is contained in web pages and/or whole document collections such as digital libraries. These new distributed information environments reveal new and challenging problems for the IR community. Consequently, in this TREC experiment we investigated two questions related to information searches...
Indicators for the Design and Evaluation of Digital Libraries
Introduction: There has always been uncertainty about the concept and framework of digital libraries. Terms such as electronic library, virtual library, library without walls, hybrid library, and digital library have often been used together, or interchangeably, to convey the same concept. Studies have shown that so far there is no standard, universally accepted definition of digital libraries, howe...
Information Retrieval on the Internet
The main components of a search engine are the Web crawler, which has the task of collecting webpages, and the Information Retrieval system, which has the task of retrieving text documents that answer a user query. In this chapter we present approaches to Web crawling, Information Retrieval models, and methods used to evaluate retrieval performance. Practical considerations include information...
Information Retrieval and Digital Libraries
Chapter Overview: The field of information retrieval (IR) is generally concerned with the indexing and retrieval of knowledge-based information. Although the name implies the retrieval of any type of information, the field has traditionally focused on retrieval of text-based documents, reflecting the type of information that was initially available by this early application of computer use. Howe...